Search Results: "osd"

3 May 2021

Russell Coker: DNS, Lots of IPs, and Postal

I decided to start work on repeating the tests for my 2006 OSDC paper on Benchmarking Mail Relays [1] and discover how the last 15 years of hardware developments have changed things. There have been software changes in that time too, but nothing that compares with going from single-core 32bit systems with less than 1G of RAM and 60G IDE disks to multi-core 64bit systems with 128G of RAM and SSDs. As an aside, the hardware I used in 2006 wasn't cutting edge and the hardware I'm using now isn't either. In both cases it's systems I bought second hand for under $1000. Pedants can think of this as comparing 2004 and 2018 hardware.

BIND

I decided to make some changes to reflect the increased hardware capacity and use 2560 domains and IP addresses, which gave the following errors as well as a startup time of a minute on a system with two E5-2620 CPUs.
May  2 16:38:37 server named[7372]: listening on IPv4 interface lo, 127.0.0.1#53
May  2 16:38:37 server named[7372]: listening on IPv4 interface eno4, 10.0.2.45#53
May  2 16:38:37 server named[7372]: listening on IPv4 interface eno4, 10.0.40.1#53
May  2 16:38:37 server named[7372]: listening on IPv4 interface eno4, 10.0.40.2#53
May  2 16:38:37 server named[7372]: listening on IPv4 interface eno4, 10.0.40.3#53
[...]
May  2 16:39:33 server named[7372]: listening on IPv4 interface eno4, 10.0.47.0#53
May  2 16:39:33 server named[7372]: listening on IPv4 interface eno4, 10.0.48.0#53
May  2 16:39:33 server named[7372]: listening on IPv4 interface eno4, 10.0.49.0#53
May  2 16:39:33 server named[7372]: listening on IPv6 interface lo, ::1#53
[...]
May  2 16:39:36 server named[7372]: zone localhost/IN: loaded serial 2
May  2 16:39:36 server named[7372]: all zones loaded
May  2 16:39:36 server named[7372]: running
May  2 16:39:36 server named[7372]: socket: file descriptor exceeds limit (123273/21000)
May  2 16:39:36 server named[7372]: managed-keys-zone: Unable to fetch DNSKEY set '.': not enough free resources
May  2 16:39:36 server named[7372]: socket: file descriptor exceeds limit (123273/21000)
The first thing I noticed is that a default configuration of BIND with 2560 local IPs (when just running in the default recursive mode) takes a minute to start and needs to open over 100,000 file handles. BIND also had some errors in that configuration which led to it not accepting shutdown requests. I filed Debian bug report #987927 [2] about this. One way of dealing with the errors in this situation on Debian is to edit /etc/default/named and put in the following line to allow BIND access to that many file handles:
OPTIONS="-u bind -S 150000"
But the best thing to do for BIND, when there are many IP addresses that aren't going to be used for DNS service, is to put a directive like the following in the BIND configuration to specify the IP address or addresses that are used for the DNS service:
listen-on { 10.0.2.45; };
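A fuller fragment covering IPv6 as well might look like the following (a minimal sketch; the addresses are placeholders to be replaced with your own):

options {
        // only listen on the addresses actually used for DNS service
        listen-on { 10.0.2.45; };
        listen-on-v6 { 2001:db8::45; };
};

With this in place BIND only opens sockets on the listed addresses instead of one per local IP.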
I have just added the listen-on and listen-on-v6 directives to one of my servers with about a dozen IP addresses. While 2560 IP addresses is an unusual corner case, it's not uncommon to have dozens of addresses on one system.

dig

When doing tests of Postfix for relaying mail I noticed that mail was being deferred with DNS problems (the error was "Host or domain name not found. Name service error for name=a838.example.com type=MX: Host not found, try again"). I tested the DNS lookups with dig, which failed with errors like the following:
dig -t mx a704.example.com
socket.c:1740: internal_send: 10.0.2.45#53: Invalid argument
socket.c:1740: internal_send: 10.0.2.45#53: Invalid argument
socket.c:1740: internal_send: 10.0.2.45#53: Invalid argument
; <<>> DiG 9.16.13-Debian <<>> -t mx a704.example.com
;; global options: +cmd
;; connection timed out; no servers could be reached
Here is a sample of the strace output from tracing dig:
bind(20, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
recvmsg(20, {msg_namelen=128}, 0)       = -1 EAGAIN (Resource temporarily unavailable)
write(4, "\24\0\0\0\375\377\377\377", 8) = 8
sendmsg(20, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.45")}, msg_namelen=16, msg_iov=[{iov_base="86\1 \0\1\0\0\0\0\0\1\4a704\7example\3com\0\0\17\0\1\0\0)\20\0\0\0\0\0\0\f\0\n\0\10's\367\265\16bx\354", iov_len=57}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = -1 EINVAL (Invalid argument)
write(2, "socket.c:1740: ", 15)         = 15
write(2, "internal_send: 10.0.2.45#53: Invalid argument", 45) = 45
write(2, "\n", 1)                       = 1
futex(0x7f5a80696084, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f5a80696010, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f5a8069809c, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f5a80698020, FUTEX_WAKE_PRIVATE, 1) = 1
sendmsg(20, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.45")}, msg_namelen=16, msg_iov=[{iov_base="86\1 \0\1\0\0\0\0\0\1\4a704\7example\3com\0\0\17\0\1\0\0)\20\0\0\0\0\0\0\f\0\n\0\10's\367\265\16bx\354", iov_len=57}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = -1 EINVAL (Invalid argument)
write(2, "socket.c:1740: ", 15)         = 15
write(2, "internal_send: 10.0.2.45#53: Invalid argument", 45) = 45
write(2, "\n", 1)
Ubuntu bug #1702726 claims that an insufficient ARP cache was the cause of dig problems [3]. At the time I encountered the dig problems I was seeing lots of kernel error messages "neighbour: arp_cache: neighbor table overflow", which I solved by putting the following in /etc/sysctl.d/mine.conf:
net.ipv4.neigh.default.gc_thresh3 = 4096
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh1 = 1024
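For reference, these thresholds can be loaded without a reboot; a quick way to apply and verify them (standard procps sysctl usage):

# load all sysctl configuration, including /etc/sysctl.d/mine.conf
sudo sysctl --system
# verify the new value took effect
sysctl net.ipv4.neigh.default.gc_thresh3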
Making that change (and having rebooted, because I didn't need to run the server overnight) didn't entirely solve the problems. I have seen some DNS errors from Postfix since then, but they are less common than before. When they happened I didn't get that error from dig. At this stage I'm not certain that the ARP change fixed the dig problem, although it seems likely (it's always difficult to be certain that you have solved a race condition instead of making it less common or accidentally changing something else that conceals it). But it is clearly a good thing to have a large enough ARP cache, so the above change is probably the right thing for most people (with the possibility of changing the numbers according to the required scale). Also people having that dig error should probably check their kernel message log; if the ARP cache isn't the cause then some other kernel networking issue might be related.

Preliminary Results

With Postfix I'm seeing around 24,000 messages relayed per minute with more than 60% CPU time idle. I'm not sure exactly how to count idle time when there are 12 CPU cores and 24 hyper-threads, as having only 1 process scheduled for each pair of hyper-threads on a core is very different to having half the CPU cores unused. I ran my script to disable hyper-threads by telling the Linux kernel to disable each processor core that has the same core ID as another; it was buggy and disabled the second CPU altogether (better than finding this out on a production server). A corrected sketch of that approach appears at the end of this post. Going from 24 hyper-threads of 2 CPUs to 6 non-HT cores of a single CPU didn't change the throughput, and the idle time went to about 30%, so I have possibly halved the CPU capacity for these tasks by disabling all hyper-threads and one entire CPU, which is surprising given that I theoretically reduced the CPU power by 75%. I think my focus now has to be on hyper-threading optimisation.

Since 2006 the performance has gone from ~20 messages per minute on relatively commodity hardware to 24,000 messages per minute on server equipment that is uncommon for home use but which is also within range of home desktop PCs. I think that a typical desktop PC with a similar speed CPU, 32G of RAM and SSD storage would give the same performance. Moore's Law (that transistor count doubles approximately every 2 years) is often misquoted as having performance double every 2 years. In this case more than 1024 times the performance over 15 years means the performance doubled roughly every 18 months. Probably most of that is due to SATA SSDs massively outperforming IDE hard drives, but it's still impressive.

Notes

I've been using example.com for test purposes for a long time, but RFC 2606 specifies .test, .example, and .invalid as reserved top level domains for such things. On the next iteration I'll change my scripts to use .test.

My current test setup has a KVM virtual machine running my bhm program to receive mail, which is taking between 20% and 50% of a CPU core in my tests so far. While that is happening the kvm process is reported as taking between 60% and 200% of a CPU core, so kvm takes as much as 4 times the CPU of the guest due to the virtual networking overhead, even though I'm using the virtio-net-pci driver (the most efficient form of KVM networking for emulating a regular ethernet card). I've also seen this in production with a virtual machine running a Tor relay node.

I've fixed a bug where Postal would try to send the SMTP quit command after encountering a TCP error, which would cause an infinite loop and SEGV.
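Here is the promised sketch of a hyper-thread disabling script that avoids the bug described above: rather than comparing core IDs (which repeat across CPU sockets), it offlines every CPU thread that is not the first entry in its own thread_siblings_list. This is an untested illustration assuming the standard Linux sysfs topology layout:

#!/bin/sh
# Offline all hyper-thread siblings, keeping one thread per core.
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    siblings="$cpu/topology/thread_siblings_list"
    [ -r "$siblings" ] || continue
    # The first entry in the list (formats like "0,12" or "0-1") stays online.
    first=$(sed 's/[,-].*//' "$siblings")
    this=${cpu##*cpu}
    if [ "$this" != "$first" ] && [ -w "$cpu/online" ]; then
        echo 0 > "$cpu/online"
    fi
done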

28 April 2021

Jonathan McDowell: DeskPi Pro update

I wrote previously about my DeskPi Pro + 8GB Pi 4 setup. My main complaint at the time was the fact one of the forward facing USB ports broke off early on in my testing. For day to day use that hasn't been a problem, but it did mar the whole experience. Last week I received an unexpected email telling me "The new updated PCB Board for your DeskPi order was shipped." Apparently this was due to problems with identifying SSDs and WiFi/HDMI issues. I wasn't quite sure how much of the internals they'd be replacing, so I was pleasantly surprised when it turned out to be most of them, including the PCB with the broken USB port on my device.

[Image: DeskPi Pro replacement PCB]

They also provided a set of feet allowing for vertical mounting of the device, which was a nice touch. The USB/SATA bridge chip in use has changed; the original was:
usb 2-1: New USB device found, idVendor=152d, idProduct=0562, bcdDevice= 1.09
usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-1: Product: RPi_SSD
usb 2-1: Manufacturer: 52Pi
usb 2-1: SerialNumber: DD5641988389F
and the new one is:
usb 2-1: New USB device found, idVendor=174c, idProduct=1153, bcdDevice= 0.01
usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=1
usb 2-1: Product: AS2115
usb 2-1: Manufacturer: ASMedia
usb 2-1: SerialNumber: 00000000000000000000
That's a move from a JMicron 6Gb/s bridge to an ASMedia 3Gb/s bridge. It seems there are compatibility issues with the JMicron that mean the downgrade is the preferred choice. I haven't retried the original SSD I wanted to use (the one that wasn't detected), but I did wonder if this might have resolved that issue too. Replacing the PCB was easier than the original install; everything was provided pre-assembled and I just had to unscrew the Pi 4 and slot it out, then screw it into the new PCB assembly. Everything booted fine without the need for any configuration tweaks. Nice and dull. I've tried plugging things into the new USB ports and they seem ok so far as well. However I also then ended up pulling in a new backports kernel from Debian (upgrading from 5.9 to 5.10) which resulted in a failure to boot. The kernel and initramfs were loaded fine, but no login prompt ever appeared. Some digging led to the discovery that a change in boot ordering meant USB was not being enabled. The solution is to add reset_raspberrypi to the /etc/initramfs-tools/modules file - that way the module is available in the initramfs, the appropriate pre-USB reset can happen, and everything works just fine again. The other niggle with the new kernel was a regular set of errors in the kernel log:
mmc1: Timeout waiting for hardware cmd interrupt.
mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
and a set of registers afterwards, roughly every 10s or so. This seems to be fallout from an increase in the core clock due to the VC4 driver now being enabled, the fact I have no SD card in the device, and the lack of a working card-detect line for the MicroSD slot. There's a GitHub issue, but I solved it by removing the sdhci_iproc module for now - I'm not using the wifi so loss of MMC isn't a problem. Both fixes are sketched below. Credit to DeskPi for how they handled this. I didn't have to do anything and didn't even realise anything was happening until I got the email with my tracking number and a description of what they were sending out in it. Delivery took less than a week. This is a great example of how to handle a product issue - no effort required on the part of the customer.
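For the record, both tweaks boil down to a few commands (a sketch; the file name under /etc/modprobe.d/ is my own choice, the rest is standard Debian initramfs-tools practice, run as root):

# make the pre-USB reset module available in the initramfs
echo reset_raspberrypi >> /etc/initramfs-tools/modules
# stop sdhci_iproc loading, silencing the mmc1 register dumps
echo 'blacklist sdhci_iproc' > /etc/modprobe.d/blacklist-sdhci-iproc.conf
# rebuild the initramfs so both changes take effect at boot
update-initramfs -u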

24 April 2021

Gunnar Wolf: FLISOL Talking about Jitsi

Every year since 2005 there is a very good, big and interesting Latin American gathering of free-software-minded people. Of course, Latin America is a big, big, big place, and it's not like we are the most economically buoyant region, able to meet in something comparable to FOSDEM. What we have is a distributed free software conference - originally a distributed Linux install-fest (which I never liked; I am against install-fests), but gradually it morphed into a proper conference: Festival Latinoamericano de Instalación de Software Libre (Latin American Free Software Installation Festival). This FLISOL was hosted by the always great and always interesting Rancho Electrónico, our favorite local hacklab, and had many other interesting talks. I like talking about projects where I am involved as a developer, but this time I decided to do otherwise: I presented a talk on the Jitsi videoconferencing server. Why? Because of the relevance videoconferences have had over the last year. So, without further ado: Here is a video I recorded locally from the talk I gave (MKV), as well as the slides (PDF).

9 April 2021

Michael Prokop: A Ceph war story

It all started with the big bang! We nearly lost 33 of 36 disks on a Proxmox/Ceph cluster; this is the story of how we recovered them. At the end of 2020, we eventually had a long outstanding maintenance window for taking care of system upgrades at a customer. During this maintenance window, which involved reboots of server systems, the involved Ceph cluster unexpectedly went into a critical state. What was planned to be a few hours of checklist work in the early evening turned into an emergency case; let's call it a nightmare (not only because it included a big part of the night). Since we have learned a few things from our post mortem and RCA, it's worth sharing those with others. But first things first, let's step back and clarify what we had to deal with.

The system and its upgrade

One part of the upgrade included 3 Debian servers (we're calling them server1, server2 and server3 here), running on Proxmox v5 + Debian/stretch with 12 Ceph OSDs each (65.45TB in total), a so-called Proxmox Hyper-Converged Ceph Cluster. First, we upgraded the Proxmox v5/stretch system to Proxmox v6/buster, before updating Ceph Luminous v12.2.13 to the latest v14.2 release supported by Proxmox v6/buster. The Proxmox upgrade included updating corosync from v2 to v3. As part of this upgrade, we had to apply some configuration changes, like adjusting ring0 + ring1 address settings and adding a mon_host configuration to the Ceph configuration. During the first two servers' reboots, we noticed configuration glitches. After fixing those, we went for a reboot of the third server as well. Then we noticed that several Ceph OSDs were unexpectedly down. The NTP service wasn't working as expected after the upgrade. The underlying issue is a race condition of ntp with systemd-timesyncd (see #889290). As a result, we had clock skew problems with Ceph, indicating that the Ceph monitors' clocks weren't running in sync (which is essential for proper Ceph operation). We initially assumed that our Ceph OSD failure derived from this clock skew problem, so we took care of it. After yet another round of reboots, to ensure the systems were running with identical and sane configurations and services, we noticed lots of failing OSDs. This time all but three OSDs (19, 21 and 22) were down:
% sudo ceph osd tree
ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       65.44138 root default
-2       21.81310     host server1
 0   hdd  1.08989         osd.0    down  1.00000 1.00000
 1   hdd  1.08989         osd.1    down  1.00000 1.00000
 2   hdd  1.63539         osd.2    down  1.00000 1.00000
 3   hdd  1.63539         osd.3    down  1.00000 1.00000
 4   hdd  1.63539         osd.4    down  1.00000 1.00000
 5   hdd  1.63539         osd.5    down  1.00000 1.00000
18   hdd  2.18279         osd.18   down  1.00000 1.00000
20   hdd  2.18179         osd.20   down  1.00000 1.00000
28   hdd  2.18179         osd.28   down  1.00000 1.00000
29   hdd  2.18179         osd.29   down  1.00000 1.00000
30   hdd  2.18179         osd.30   down  1.00000 1.00000
31   hdd  2.18179         osd.31   down  1.00000 1.00000
-4       21.81409     host server2
 6   hdd  1.08989         osd.6    down  1.00000 1.00000
 7   hdd  1.08989         osd.7    down  1.00000 1.00000
 8   hdd  1.63539         osd.8    down  1.00000 1.00000
 9   hdd  1.63539         osd.9    down  1.00000 1.00000
10   hdd  1.63539         osd.10   down  1.00000 1.00000
11   hdd  1.63539         osd.11   down  1.00000 1.00000
19   hdd  2.18179         osd.19     up  1.00000 1.00000
21   hdd  2.18279         osd.21     up  1.00000 1.00000
22   hdd  2.18279         osd.22     up  1.00000 1.00000
32   hdd  2.18179         osd.32   down  1.00000 1.00000
33   hdd  2.18179         osd.33   down  1.00000 1.00000
34   hdd  2.18179         osd.34   down  1.00000 1.00000
-3       21.81419     host server3
12   hdd  1.08989         osd.12   down  1.00000 1.00000
13   hdd  1.08989         osd.13   down  1.00000 1.00000
14   hdd  1.63539         osd.14   down  1.00000 1.00000
15   hdd  1.63539         osd.15   down  1.00000 1.00000
16   hdd  1.63539         osd.16   down  1.00000 1.00000
17   hdd  1.63539         osd.17   down  1.00000 1.00000
23   hdd  2.18190         osd.23   down  1.00000 1.00000
24   hdd  2.18279         osd.24   down  1.00000 1.00000
25   hdd  2.18279         osd.25   down  1.00000 1.00000
35   hdd  2.18179         osd.35   down  1.00000 1.00000
36   hdd  2.18179         osd.36   down  1.00000 1.00000
37   hdd  2.18179         osd.37   down  1.00000 1.00000
Our blood pressure increased slightly! Did we just lose all of our cluster? What happened, and how can we get all the other OSDs back? We stumbled upon this beauty in our logs:
kernel: [   73.697957] XFS (sdl1): SB stripe unit sanity check failed
kernel: [   73.698002] XFS (sdl1): Metadata corruption detected at xfs_sb_read_verify+0x10e/0x180 [xfs], xfs_sb block 0xffffffffffffffff
kernel: [   73.698799] XFS (sdl1): Unmount and run xfs_repair
kernel: [   73.699199] XFS (sdl1): First 128 bytes of corrupted metadata buffer:
kernel: [   73.699677] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
kernel: [   73.700205] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
kernel: [   73.700836] 00000020: 62 44 2b c0 e6 22 40 d7 84 3d e1 cc 65 88 e9 d8  bD+.."@..=..e...
kernel: [   73.701347] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
kernel: [   73.701770] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
ceph-disk[4240]: mount: /var/lib/ceph/tmp/mnt.jw367Y: mount(2) system call failed: Structure needs cleaning.
ceph-disk[4240]: ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', u'xfs', '-o', 'noatime,inode64', '--', '/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.cdda39ed-5
ceph/tmp/mnt.jw367Y']' returned non-zero exit status 32
kernel: [   73.702162] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
kernel: [   73.702550] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
kernel: [   73.702975] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
kernel: [   73.703373] XFS (sdl1): SB validate failed with error -117.
The same issue was present for the other failing OSDs. We hoped that the data itself was still there and that only the mounting of the XFS partitions failed. The Ceph cluster was initially installed in 2017 with Ceph jewel/10.2 with the OSDs on filestore (nowadays a legacy approach to storing objects in Ceph). However, we have migrated the disks to bluestore since then (with ceph-disk, and not yet via ceph-volume, which is what's used nowadays). Using ceph-disk introduces these 100MB XFS partitions containing basic metadata for the OSD. Given that we had three working OSDs left, we decided to investigate how to rebuild the failing ones. Some folks on #ceph (thanks T1, ormandj + peetaur!) were kind enough to share what working XFS partitions looked like for them. After creating a backup (via dd), we tried to re-create such an XFS partition on server1. We noticed that even mounting a freshly created XFS partition failed:
synpromika@server1 ~ % sudo mkfs.xfs -f -i size=2048 -m uuid="4568c300-ad83-4288-963e-badcd99bf54f" /dev/sdc1
meta-data=/dev/sdc1              isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
synpromika@server1 ~ % sudo mount /dev/sdc1 /mnt/ceph-recovery
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
cache_node_purge: refcount was 1, not zero (node=0x1d3c400)
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x18800/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x18800/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x24c00/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x24c00/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0xc400/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0xc400/0x1000
releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!found dirty buffer (bulk) on free list!bad magic number
bad magic number
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
releasing dirty buffer (bulk) to free list!mount: /mnt/ceph-recovery: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.
Ouch. This very much looked related to the actual issue we were seeing. So we tried to execute mkfs.xfs with a bunch of different sunit/swidth settings. Using -d sunit=512 -d swidth=512 at least worked, so we decided to force its usage in the creation of our OSD XFS partitions. This brought us a working XFS partition. Please note, sunit must not be larger than swidth (more on that later!). Then we reconstructed how to restore all the metadata for the OSD (activate.monmap, active, block_uuid, bluefs, ceph_fsid, fsid, keyring, kv_backend, magic, mkfs_done, ready, require_osd_release, systemd, type, whoami). To identify the UUID, we can read the data from "ceph --format json osd dump", like this for all our OSDs (Zsh syntax ftw!):
synpromika@server1 ~ % for f in {0..37}; printf "osd-$f: %s\n" "$(sudo ceph --format json osd dump | jq -r ".osds[] | select(.osd==$f) | .uuid")"
osd-0: 4568c300-ad83-4288-963e-badcd99bf54f
osd-1: e573a17a-ccde-4719-bdf8-eef66903ca4f
osd-2: 0e1b2626-f248-4e7d-9950-f1a46644754e
osd-3: 1ac6a0a2-20ee-4ed8-9f76-d24e900c800c
[...]
Identifying the corresponding raw device for each OSD UUID is possible via:
synpromika@server1 ~ % UUID="4568c300-ad83-4288-963e-badcd99bf54f"
synpromika@server1 ~ % readlink -f /dev/disk/by-partuuid/"${UUID}"
/dev/sdc1
The OSD's key ID can be retrieved via:
synpromika@server1 ~ % OSD_ID=0
synpromika@server1 ~ % sudo ceph auth get osd."${OSD_ID}" -f json 2>/dev/null | jq -r '.[] | .key'
AQCKFpZdm0We[...]
Now we also need to identify the underlying block device:
synpromika@server1 ~ % OSD_ID=0
synpromika@server1 ~ % sudo ceph osd metadata osd."${OSD_ID}" -f json | jq -r '.bluestore_bdev_partition_path'
/dev/sdc2
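Putting those lookups together, the recovery for a single OSD followed roughly this shape (a condensed, illustrative sketch rather than the exact script we ran; it assumes the backup share and the forced sunit/swidth settings described above):

OSD_ID=0
UUID=$(sudo ceph --format json osd dump | jq -r ".osds[] | select(.osd==$OSD_ID) | .uuid")
DEV=$(readlink -f /dev/disk/by-partuuid/"${UUID}")
KEY=$(sudo ceph auth get osd."${OSD_ID}" -f json 2>/dev/null | jq -r '.[] | .key')
# back up the broken metadata partition before touching it
sudo dd if="$DEV" of=/srv/backup/osd-"${OSD_ID}".dd bs=1M
# re-create the XFS metadata partition with sane stripe settings
sudo mkfs.xfs -f -i size=2048 -d sunit=512 -d swidth=512 -m uuid="${UUID}" "$DEV"
# mount it and restore the per-OSD metadata files
sudo mount "$DEV" /mnt
echo "$OSD_ID" | sudo tee /mnt/whoami
echo "$UUID" | sudo tee /mnt/fsid
printf '[osd.%s]\n\tkey = %s\n' "$OSD_ID" "$KEY" | sudo tee /mnt/keyring
# ...plus block/block_uuid and the files that are identical across all OSDs
sudo umount /mnt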
With all of this, we reconstructed the keyring, fsid, whoami, block + block_uuid files. All the other files inside the XFS metadata partition are identical on each OSD. So after placing and adjusting the corresponding metadata on the XFS partition for Ceph usage, we got a working OSD - hurray! Since we had to fix yet another 32 OSDs, we decided to automate this XFS partitioning and metadata recovery procedure. We had a network share available on /srv/backup for storing backups of existing partition data. On each server, we tested the procedure with one single OSD before iterating over the list of remaining failing OSDs. We started with a shell script on server1, then adjusted the script for server2 and server3. This is the script, as we executed it on the 3rd server. Thanks to this, we managed to get the Ceph cluster up and running again. We didn't want to continue with the Ceph upgrade itself during the night though, as we wanted to know exactly what was going on and why the system behaved like that. Time for RCA!

Root Cause Analysis

So all but three OSDs on server2 failed, and the problem seemed to be related to XFS. Therefore, our starting point for the RCA was to identify what was different on server2 compared to server1 + server3. My initial assumption was that this was related to some firmware issues with the involved controller (and as it turned out later, I was right!). The disks were attached as JBOD devices to a ServeRAID M5210 controller (with a stripe size of 512). Firmware state:
synpromika@server1 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.16.0-0092
Firmware Version = 4.660.00-8156
synpromika@server2 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.21.0-0112
Firmware Version = 4.680.00-8489
synpromika@server3 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.16.0-0092
Firmware Version = 4.660.00-8156
This looked very promising, as server2 indeed runs with a different firmware version on the controller. But how so? Well, the motherboard of server2 was replaced by a Lenovo/IBM technician in January 2020, as we had a failing memory slot during a memory upgrade. As part of this procedure, the Lenovo/IBM technician installed the latest firmware versions. According to our documentation, some OSDs were rebuilt (due to the filestore->bluestore migration) in March and April 2020. It turned out that precisely those OSDs were the ones that survived the upgrade. So the surviving drives were created with a different firmware version running on the involved controller. All the other OSDs were created with an older controller firmware. But what difference does this make? Now let's check the firmware changelogs. For the 24.21.0-0097 release we found this:
- Cannot create or mount xfs filesystem using xfsprogs 4.19.x kernel 4.20(SCGCQ02027889)
- xfs_info command run on an XFS file system created on a VD of strip size 1M shows sunit and swidth as 0(SCGCQ02056038)
Our XFS problem certainly was related to the controller's firmware. We also recalled that our monitoring system reported different sunit settings for the OSDs that were rebuilt in March and April. For example, OSD 21 was recreated and got different sunit settings:
WARN  server2.example.org  Mount options of /var/lib/ceph/osd/ceph-21      WARN - Missing: sunit=1024, Exceeding: sunit=512
We compared the new OSD 21 with an existing one (OSD 25 on server3):
synpromika@server2 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d21.mount | grep sunit
Options=rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota
synpromika@server3 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d25.mount | grep sunit
Options=rw,noatime,attr2,inode64,sunit=1024,swidth=512,noquota
Thanks to our documentation, we could compare execution logs of their creation:
% diff -u ceph-disk-osd-25.log ceph-disk-osd-21.log
-synpromika@server3 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdj --osd-id 25
+synpromika@server2 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdi --osd-id 21
[...]
-command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdj1
-meta-data=/dev/sdj1              isize=2048   agcount=4, agsize=6272 blks
[...]
+command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdi1
+meta-data=/dev/sdi1              isize=2048   agcount=4, agsize=6336 blks
          =                       sectsz=4096  attr=2, projid32bit=1
          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
-data     =                       bsize=4096   blocks=25088, imaxpct=25
-         =                       sunit=128    swidth=64 blks
+data     =                       bsize=4096   blocks=25344, imaxpct=25
+         =                       sunit=64     swidth=64 blks
 naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
 log      =internal log           bsize=4096   blocks=1608, version=2
          =                       sectsz=4096  sunit=1 blks, lazy-count=1
 realtime =none                   extsz=4096   blocks=0, rtextents=0
[...]
So back then, we even tried to track this down but couldn't make sense of it yet. But now this sounds very much like it is related to the problem we saw with this Ceph/XFS failure. We follow Occam's razor, assuming the simplest explanation is usually the right one, so let's check the disk properties and see what differs:
synpromika@server1 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk
4685545472
2398999281664
512
4096
524288
262144
synpromika@server2 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk
4685545472
2398999281664
512
4096
262144
262144
See the difference between server1 and server2 for identical disks? The getiomin option now reports something different for them:
synpromika@server1 ~ % sudo blockdev --getiomin /dev/sdk            
524288
synpromika@server1 ~ % cat /sys/block/sdk/queue/minimum_io_size
524288
synpromika@server2 ~ % sudo blockdev --getiomin /dev/sdk 
262144
synpromika@server2 ~ % cat /sys/block/sdk/queue/minimum_io_size
262144
It doesn't make sense that the minimum I/O size (iomin, AKA BLKIOMIN) is bigger than the optimal I/O size (ioopt, AKA BLKIOOPT). This led us to Bug 202127 - "cannot mount or create xfs on a 597T device" - which matches our findings here. But why did this XFS partition work in the past and fail now with the newer kernel version?

The XFS behaviour change

Now given that we have backups of all the XFS partitions, we wanted to track down a) when this XFS behaviour was introduced, and b) whether, and if so how, it would be possible to reuse the XFS partition without having to rebuild it from scratch (e.g. if you had no working Ceph OSD or backups left). Let's look at such a failing XFS partition with the Grml live system:
root@grml ~ # grml-version
grml64-full 2020.06 Release Codename Ausgehfuahangl [2020-06-24]
root@grml ~ # uname -a
Linux grml 5.6.0-2-amd64 #1 SMP Debian 5.6.14-2 (2020-06-09) x86_64 GNU/Linux
root@grml ~ # grml-hostname grml-2020-06
Setting hostname to grml-2020-06: done
root@grml ~ # exec zsh
root@grml-2020-06 ~ # dpkg -l xfsprogs util-linux
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=========================================
ii  util-linux     2.35.2-4     amd64        miscellaneous system utilities
ii  xfsprogs       5.6.0-1+b2   amd64        Utilities for managing the XFS filesystem
There it's failing, no matter which mount option we try:
root@grml-2020-06 ~ # mount ./sdd1.dd /mnt
mount: /mnt: mount(2) system call failed: Structure needs cleaning.
root@grml-2020-06 ~ # dmesg | tail -30
[...]
[   64.788640] XFS (loop1): SB stripe unit sanity check failed
[   64.788671] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff
[   64.788671] XFS (loop1): Unmount and run xfs_repair
[   64.788672] XFS (loop1): First 128 bytes of corrupted metadata buffer:
[   64.788673] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
[   64.788674] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   64.788675] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36  2..5S.D..c0..+h6
[   64.788675] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
[   64.788675] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
[   64.788676] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
[   64.788677] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
[   64.788677] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
[   64.788679] XFS (loop1): SB validate failed with error -117.
root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota ./sdd1.dd /mnt/
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error.
32 root@grml-2020-06 ~ # dmesg | tail -1
[   66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024)
root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=512,swidth=512,noquota ./sdd1.dd /mnt/
mount: /mnt: mount(2) system call failed: Structure needs cleaning.
32 root@grml-2020-06 ~ # dmesg | tail -14
[   66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024)
[   80.751277] XFS (loop1): SB stripe unit sanity check failed
[   80.751323] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff 
[   80.751324] XFS (loop1): Unmount and run xfs_repair
[   80.751325] XFS (loop1): First 128 bytes of corrupted metadata buffer:
[   80.751327] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
[   80.751328] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   80.751330] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36  2..5S.D..c0..+h6
[   80.751331] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
[   80.751331] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
[   80.751332] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
[   80.751333] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
[   80.751334] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
[   80.751338] XFS (loop1): SB validate failed with error -117.
xfs_repair doesn't help either:
root@grml-2020-06 ~ # xfs_info ./sdd1.dd
meta-data=./sdd1.dd              isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@grml-2020-06 ~ # xfs_repair ./sdd1.dd
Phase 1 - find and verify superblock...
bad primary superblock - bad stripe width in superblock !!!
attempting to find secondary superblock...
..............................................................................................Sorry, could not find valid secondary superblock
Exiting now.
With the "SB stripe unit sanity check failed" message, we could easily track this down to the following commit fa4ca9c:
% git show fa4ca9c5574605d1e48b7e617705230a0640b6da | cat
commit fa4ca9c5574605d1e48b7e617705230a0640b6da
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Jun 5 10:06:16 2018 -0700
    
    xfs: catch bad stripe alignment configurations
    
    When stripe alignments are invalid, data alignment algorithms in the
    allocator may not work correctly. Ensure we catch superblocks with
    invalid stripe alignment setups at mount time. These data alignment
    mismatches are now detected at mount time like this:
    
    XFS (loop0): SB stripe unit sanity check failed
    XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff
    XFS (loop0): Unmount and run xfs_repair
    XFS (loop0): First 128 bytes of corrupted metadata buffer:
    0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00  XFSB............
    0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2  .27...F=...3....
    000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0  ................
    0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2  ................
    00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00  ................
    000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00  ... .4..........
    00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19  ................
    XFS (loop0): SB validate failed with error -117.
    
    And the mount fails.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
diff --git fs/xfs/libxfs/xfs_sb.c fs/xfs/libxfs/xfs_sb.c
index b5dca3c8c84d..c06b6fc92966 100644
--- fs/xfs/libxfs/xfs_sb.c
+++ fs/xfs/libxfs/xfs_sb.c
@@ -278,6 +278,22 @@ xfs_mount_validate_sb(
 		return -EFSCORRUPTED;
 	}
 
+	if (sbp->sb_unit) {
+		if (!xfs_sb_version_hasdalign(sbp) ||
+		    sbp->sb_unit > sbp->sb_width ||
+		    (sbp->sb_width % sbp->sb_unit) != 0) {
+			xfs_notice(mp, "SB stripe unit sanity check failed");
+			return -EFSCORRUPTED;
+		}
+	} else if (xfs_sb_version_hasdalign(sbp)) {
+		xfs_notice(mp, "SB stripe alignment sanity check failed");
+		return -EFSCORRUPTED;
+	} else if (sbp->sb_width) {
+		xfs_notice(mp, "SB stripe width sanity check failed");
+		return -EFSCORRUPTED;
+	}
+
 	if (xfs_sb_version_hascrc(&mp->m_sb) &&
 	    sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE) {
 		xfs_notice(mp, "v5 SB sanity check failed");
This change is included in kernel versions 4.18-rc1 and newer:
% git describe --contains fa4ca9c5574605d1e48
v4.18-rc1~37^2~14
Now let's try with an older kernel version (4.9.0), using the old Grml 2017.05 release:
root@grml ~ # grml-version
grml64-small 2017.05 Release Codename Freedatensuppe [2017-05-31]
root@grml ~ # uname -a
Linux grml 4.9.0-1-grml-amd64 #1 SMP Debian 4.9.29-1+grml.1 (2017-05-24) x86_64 GNU/Linux
root@grml ~ # lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 9.0 (stretch)
Release:        9.0
Codename:       stretch
root@grml ~ # grml-hostname grml-2017-05
Setting hostname to grml-2017-05: done
root@grml ~ # exec zsh
root@grml-2017-05 ~ #
root@grml-2017-05 ~ # xfs_info ./sdd1.dd
xfs_info: ./sdd1.dd is not a mounted XFS filesystem
1 root@grml-2017-05 ~ # xfs_repair ./sdd1.dd
Phase 1 - find and verify superblock...
bad primary superblock - bad stripe width in superblock !!!
attempting to find secondary superblock...
..............................................................................................Sorry, could not find valid secondary superblock
Exiting now.
1 root@grml-2017-05 ~ # mount ./sdd1.dd /mnt
root@grml-2017-05 ~ # mount -t xfs
/root/sdd1.dd on /mnt type xfs (rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota)
root@grml-2017-05 ~ # ls /mnt
activate.monmap  active  block  block_uuid  bluefs  ceph_fsid  fsid  keyring  kv_backend  magic  mkfs_done  ready  require_osd_release  systemd  type  whoami
root@grml-2017-05 ~ # xfs_info /mnt
meta-data=/dev/loop1             isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Mounting there indeed works! Now, if we mount the filesystem with new and proper sunit/swidth settings using the older kernel, it should rewrite them on disk:
root@grml-2017-05 ~ # mount -t xfs -o sunit=512,swidth=512 ./sdd1.dd /mnt/
root@grml-2017-05 ~ # umount /mnt/
And indeed, mounting this rewritten filesystem then also works with newer kernels:
root@grml-2020-06 ~ # mount ./sdd1.rewritten /mnt/
root@grml-2020-06 ~ # xfs_info /root/sdd1.rewritten
meta-data=/dev/loop1             isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=64    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@grml-2020-06 ~ # mount -t xfs                
/root/sdd1.rewritten on /mnt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota)
FTR: The sunit=512,swidth=512 from the xfs mount options is identical to xfs_info's output of sunit=64,swidth=64, because mount.xfs's sunit value is given in 512-byte block units (see man 5 xfs), while the xfs_info output reported here is in blocks with a block size (bsize) of 4096; both come to 262144 bytes, since 512 units * 512 bytes = 64 blocks * 4096 bytes. mkfs uses the minimum and optimal I/O sizes for stripe unit and stripe width; you can check this e.g. via the following (note that server2, with the fixed firmware version, reports proper values, whereas server3, with the broken controller firmware, reports nonsense):
synpromika@server2 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done
[...]
/sys/block/sdc/queue/: 262144 262144
/sys/block/sdd/queue/: 262144 262144
/sys/block/sde/queue/: 262144 262144
/sys/block/sdf/queue/: 262144 262144
/sys/block/sdg/queue/: 262144 262144
/sys/block/sdh/queue/: 262144 262144
/sys/block/sdi/queue/: 262144 262144
/sys/block/sdj/queue/: 262144 262144
/sys/block/sdk/queue/: 262144 262144
/sys/block/sdl/queue/: 262144 262144
/sys/block/sdm/queue/: 262144 262144
/sys/block/sdn/queue/: 262144 262144
[...]
synpromika@server3 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done
[...]
/sys/block/sdc/queue/: 524288 262144
/sys/block/sdd/queue/: 524288 262144
/sys/block/sde/queue/: 524288 262144
/sys/block/sdf/queue/: 524288 262144
/sys/block/sdg/queue/: 524288 262144
/sys/block/sdh/queue/: 524288 262144
/sys/block/sdi/queue/: 524288 262144
/sys/block/sdj/queue/: 524288 262144
/sys/block/sdk/queue/: 524288 262144
/sys/block/sdl/queue/: 524288 262144
/sys/block/sdm/queue/: 524288 262144
/sys/block/sdn/queue/: 524288 262144
[...]
This is the underlying reason why the initially created XFS partitions were created with incorrect sunit/swidth settings. The broken firmware of server1 and server3 was the cause of the incorrect settings: they were ignored by old(er) XFS/kernel versions, but treated as an error by new ones. Make sure to also read the XFS FAQ regarding "How to calculate the correct sunit,swidth values for optimal performance". We also stumbled upon two interesting reads in RedHat's knowledge base: 5075561 + 2150101 (requires an active subscription, though) and #1835947.

Am I affected? How to work around it?

To check whether your XFS mount points are affected by this issue, the following command line should be useful:
awk '$3 == "xfs" {print $2}' /proc/self/mounts | while read mount ; do echo -n "$mount " ; xfs_info $mount | awk '$0 ~ "swidth" {gsub(/.*=/,"",$2); gsub(/.*=/,"",$3); print $2,$3}' | awk '{ if ($1 > $2) print "impacted"; else print "OK" }' ; done
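The same check, reflowed for readability (identical logic; the first awk picks the XFS mount points, the second extracts sunit and swidth in blocks, the third compares them):

awk '$3 == "xfs" {print $2}' /proc/self/mounts | while read -r mount ; do
    echo -n "$mount "
    xfs_info "$mount" \
      | awk '$0 ~ "swidth" {gsub(/.*=/,"",$2); gsub(/.*=/,"",$3); print $2,$3}' \
      | awk '{ if ($1 > $2) print "impacted"; else print "OK" }'
done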
If you run into the above situation, the only known solution to get your original XFS partition working again is to boot into an older kernel version (4.17 or older), mount the XFS partition with correct sunit/swidth settings, and then boot back into your new system (kernel version wise).

Lessons learned

Thanks: Darshaka Pathirana, Chris Hofstaedtler and Michael Hanscho. Looking for help with your IT infrastructure? Let us know!

22 March 2021

Wouter Verhelst: Twenty years of Debian

Ten years ago, I reflected on the fact that -- by that time -- I had been in Debian for just over ten years. This year, in early February, I passed the twenty year milestone. As I'm turning 43 this year, in about three years I will have been in Debian for half my life. Scary thought, that. In the past ten years, not much has changed, and yet at the same time, much has. I became involved in the Debian video team; I stepped down from the m68k port; and my organizing of the Debian devroom at FOSDEM resulted in me eventually joining the FOSDEM orga team, where I eventually ended up also doing video. As part of my video work, I wrote SReview, on which in these COVID-19 times I have had to spend much of my spare time writing new code and/or fixing bugs. I was a candidate for the position of DPL one more time, without being elected. I was also a candidate for the technical committee a few times, also without success. I also added a few packages to the list of packages that I maintain for Debian; most obviously this includes SReview, but there's also things like extrepo and policy-rcd-declarative, both fairly recent packages that I hope will improve Debian as a whole in the longer term. On a more personal level, at one DebConf I met a wonderful girl with whom I have now just celebrated my first wedding anniversary. Before that could happen, I had to move to South Africa two years ago. Moving is an involved process at any time; moving to a different continent altogether is even more so. As it would have been complicated and involved to remain a business owner of a Belgian business while living 9500km away from the country, I sold my shares to my (now ex) business partner; it turned the page on a 15-year chapter of my life, something I could not do without feelings one way or the other. The things I do in Debian have changed over the past twenty years. I've been the maintainer of the second-highest number of packages in the project when I maintained the Linux Gazette packages; I've been an m68k porter; I've been an AM, and briefly even an NM frontdesk member; I've been a DPL candidate three times, and a TC candidate twice. At the turn of my first decade of being a Debian Developer, I noted that people had started to recognize my name, and that I had started to be one of the Debian Developers who had been with the project longer than most. This has, obviously, not changed. New in the "I'm getting old" department is the fact that during the last DebConf, I noticed for the first time that there was a speaker who had been alive for less time than I had been a Debian Developer. I'm assuming these types of things will continue happening in the next decade, and that the future will bring more of these kinds of changes that will make me feel older as I and the project mature. I'm looking forward to it. Here's to you, Debian; may you continue to influence my life, in good ways and in bad (but hopefully mostly good), as well as continue to inspire me to improve the world, as you have over the past twenty years!

1 March 2021

Paul Wise: FLOSS Activities February 2021

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration
  • Debian: fix permissions for XMPP anti-spam git
  • Debian wiki: workaround moin bug with deleting deprecated pages, unblock IP addresses, approve accounts

Communication
  • Respond to queries from Debian users and contributors on the mailing lists and IRC
  • Edited and sent Debian DevNews #54

Sponsors

The purple-discord/harmony/librecaptcha/libemail-outlook-message-perl work was sponsored by my employer. All other work was done on a volunteer basis.

26 January 2021

Gunnar Wolf: Back to school... As a student

Although it was a much larger step when I made a similar announcement seven years ago, when I started my Specialization, it is still a big challenge ahead, and I am very happy to pursue this: I have been admitted to a PhD program at UNAM, the university I have worked at for almost 20 years, and one of the top universities in Latin America. What program will I be part of? Doctorado en Ciencia e Ingeniería de la Computación (Computer Science and Engineering Doctorate - quite a broad program name, yes; sounds like anything goes). I am happy to say I managed to do as I hoped seven years ago. As that blog post says, I managed to keep an eye on my keyring-maint duties as well, and will even try to link that work with what I do at school. Over the years I spent pursuing my Specialization and Masters degrees at IPN ESIME, I managed to publish two academic papers on the keyring-maint work: "Strengthening a Curated Web of Trust in a Geographically Distributed Project" and "Insights on the large-scale deployment of a curated Web-of-Trust: the Debian project's cryptographic keyring". Since that time, several relevant things have happened. Mainly, the SKS keyserver panorama started looking quite bleak: various attacks, such as poisoned certificates or certificate flooding, have been mounted against the keyserver network, having as a direct outcome the questioning of the future of the decentralized transitional trust model we take for granted in the OpenPGP world. The global SKS keyserver network has quickly eroded, and its continued functioning is no longer something we can take as a given. Different methods have come up attempting to answer this situation, such as WKD and DANE, but they all lose something that can be seen as the essence, almost the heart, of the distributed, decentralized Web-of-Trust paradigm: the ability to carry the full certificates for the keys. And that's the problem I will try to tackle with my work: how can we, in the light of the known weaknesses, keep a working, decentralized, distributed trust scheme?

17 January 2021

Wouter Verhelst: SReview 0.6

... isn't ready yet, but it's getting there. I had planned to release a new version of SReview, my online video review and transcoding system that I wrote originally for FOSDEM but is being used for DebConf, too, after it was set up and running properly for FOSDEM 2020. However, things got a bit busy (both in my personal life and in the world at large), so it fell a bit by the wayside. I've now also been working on things a bit more, in preparation for an improved administrator's interface, and have started implementing a REST API to deal with talks etc through HTTP calls. This seems to be coming along nicely, thanks to OpenAPI and the Mojolicious plugin for parsing that. I can now design the API nicely, and autogenerate client side libraries to call them. While at it, because libmojolicious-plugin-openapi-perl isn't available in Debian 10 "buster", I moved the docker containers over from stable to testing. This revealed that both bs1770gain and inkscape changed their command line incompatibly, resulting in me having to work around those incompatibilities. The good news is that I managed to do so in a way that keeps running SReview on Debian 10 viable, provided one installs Mojolicious::Plugin::OpenAPI from CPAN rather than from a Debian package. Or installs a backport of that package, of course. Or, heck, uses the Docker containers in a kubernetes environment or some such -- I'd love to see someone use that in production. Anyway, I'm still finishing the API, and the implementation of that API and the test suite that ensures the API works correctly, but progress is happening; and as soon as things seem to be working properly, I'll do a release of SReview 0.6, and will upload that to Debian. Hopefully that'll be soon.

Wouter Verhelst: Dear Google

... Why do you have to be so effing difficult about a YouTube API project that is used for a single event per year? FOSDEM creates 600+ videos on a yearly basis. There is no way I am going to manually upload 600+ videos through your web interface, so we use the API you provide, using a script written by Stefano Rivera. This script grabs video filenames and metadata from a YAML file, and then uses your APIs to upload said videos with said metadata. It works quite well. I run it from cron, and it uploads files until the quota is exhausted, then waits until the next time the cron job runs. It runs so well that the first time we used it, we could upload 50+ videos on a daily basis, and so the uploads were done as soon as all the videos were created, which was a few months after the event. Cool! The second time we used the script, it did not work at all. We asked one of our keynote speakers, who happened to be some hotshot at your company, to help us out. He contacted the YouTube people, and whatever had been broken was quickly fixed, so yay, uploads worked again. I found out later that this is actually a normal thing if you don't use your API quota for 90 days or more. Because it's happened to us every bloody year. For the 2020 event, rather than going through back channels (which happened to be unavailable this edition), I tried to use your normal ways of unblocking the API project. This involves creating a screencast of a bloody command line script and describing various things that don't apply to FOSDEM and ghaah shoot me now... so, meh, I created a new API project instead, and had the uploads go through that. Doing so gives me a limited quota that only allows about 5 or 6 videos per day, but that's fine; it gives people subscribed to our channel the time to actually watch all the videos while they're being uploaded, rather than being presented with a boatload of videos that they can never watch in a day. Also it doesn't overload subscribers, so yay. About three months ago, I started uploading videos. Since then, every day, the "fosdemtalks" channel on YouTube has published five or six videos. Given that, imagine my surprise when I found this in my mailbox this morning...

[Image: Google claiming that my YouTube API project isn't being used for 90 days and informing me that it will be disabled]

This is an outright lie, Google. The project was created 90 days ago, yes, that's correct. It has been used every day since then to upload videos. I guess that means I'll have to deal with your broken automatic content filters to try and get stuff unblocked... or I could just give up and not do this anymore. After all, all the FOSDEM content is available on our public video host, too.

15 January 2021

Mike Gabriel: UBports: Packaging of Lomiri Operating Environment for Debian (part 04)

Before and during FOSDEM 2020, I agreed with the people (developers, supporters, managers) of the UBports Foundation to package the Unity8 Operating Environment for Debian. Since 27th Feb 2020, Unity8 has now become Lomiri. Things got delayed a little recently as my main developer contact on the upstream side was on sick leave for a while. Fortunately, he has now fully recovered and work is getting back on track. Recent Uploads to Debian related to Lomiri Over the past 3 months I worked on the following bits and pieces regarding Lomiri in Debian: The next projects / packages ahead are lomiri-ui-toolkit, integrating changes required for Lomiri into Ayatana Indicators, and then moving on with several other, smaller packages. Credits Many big thanks go to Marius and Dalton for their work on the UBports project and for always being available for questions, feedback, etc. Thanks to Florian Leeber for being my point of contact for topics regarding my cooperation with the UBports Foundation.

9 January 2021

Jonathan McDowell: Free Software Activities for 2020

As a reader of Planet Debian I see a bunch of updates at the start of each month about what people are up to in terms of their Free Software activities. I'm not generally active enough in the Free Software world to justify a monthly report, but I did a report of my Free Software Activities for 2019 and thought I'd do another for 2020. I ended up not doing as much as last year; I put a lot of that down to fatigue about the state of the world and generally not wanting to spend time on the computer at the end of the working day.

Conferences 2020 was unsurprisingly not a great year for conference attendance. I was fortunate enough to make it to FOSDEM and CopyleftConf 2020 - I didn't speak at either, but had plenty of interesting hallway track conversations as well as seeing some good talks. I hadn't been planning to attend DebConf20 due to time constraints, but its move to an entirely online conference meant I was able to attend a few talks at least. I have to say I don't like virtual conferences as much as the real thing; it's not as easy to have the casual chats at them, and it's also harder to carve out the exclusive time when you're at home. That said, I spoke at NIDevConf this year, which was also fully virtual. It's not a Free Software focussed conference, but there's a lot of crossover in terms of technologies and I spoke on my experiences with Go, some of which are influenced by my packaging experiences within Debian.

Debian Most of my contributions to Free Software happen within Debian. As part of the Data Protection Team I responded to various inbound queries to that team. Some of this involved chasing up other project teams who had been slow to respond - folks, if you're running a service that stores personal data about people then you need to be responsive to requests about it. The Debian Keyring was possibly my largest single point of contribution. We're in a roughly 3-month rotation of who handles the keyring updates, and I handled 2020.02.02, 2020.03.24, 2020.06.24, 2020.09.24 + 2020.12.24. For Debian New Members I'm mostly inactive as an application manager - we generally seem to have enough available recently. If that changes I'll look at stepping in to help, but I don't see that happening. I continue to be involved in Front Desk, having various conversations throughout the year with the rest of the team, but there's no doubt Mattia and Pierre-Elliott are the real doers at present. In terms of package uploads I continued to work on gcc-xtensa-lx106, largely doing uploads to deal with updates to the GCC version or packaging (5, 6 + 7). sigrok had a few minor updates, libsigrok 0.5.2-2 and libsigrokdecode 0.5.3-2, as well as a new upstream release of PulseView 0.4.2-1 and a fix to cope with a change to Qt (0.4.2-2). Due to the sigrok-firmware requirement on sdcc I also continued to help out there, updating to 4.0.0+dfsg-1 and doing some fixups in 4.0.0+dfsg-2. Despite still not writing any VHDL these days I continue to try and make sure ghdl is ok, because I found it a useful tool in the past. In 2020 that meant a new upstream release, 0.37+dfsg-1, along with a couple of more minor updates (0.37+dfsg-2 + 0.37+dfsg-3). libcli had a new upstream release, 1.10.4-1, and I did a long overdue update of sendip to the latest upstream release, 2.6-1, having been poked about an outstanding bug by the Reproducible Builds folk. OpenOCD is coming up on 4 years since its last stable release, but I did a snapshot upload to Debian experimental (0.10.0+g20200530-1) and a subsequent one to unstable (0.10.0+g20200819-1). There are also moves to produce a 0.11.0 release and I uploaded 0.11.0~rc1-1 as a result. libjaylink got a bump as a result (0.2.0-1) after some discussion with upstream.

OpenOCD On the subject of OpenOCD I've tried to be a bit more involved upstream. I'm not familiar enough with the intricacies of JTAG/SWD/the various architectures supported to contribute to the core, but I pushed the config for my HIE JTAG adapter upstream and try to review patches that don't require in-depth hardware knowledge.

Linux I've been contributing to the Linux kernel for a number of years now, mostly just minor bits here and there for issues I hit. This year I spent a lot of time getting support for the MikroTik RB3011 router upstreamed. That included the basic DTS addition, fixing up QCA8K to support SGMII CPU connections, adding proper 802.1q VLAN support to QCA8K and cleaning up an existing QCOM ADM driver that's required for the NAND. There were a number of associated bugfixes/minor changes found along the way too. It can be a little frustrating at times going round the review loop when submitting things upstream, but I do find it quite satisfying when it all comes together, and I have no interest in weird vendor trees that just bitrot over time.

Software in the Public Interest I haven't sat on the board of SPI since 2015, but I was still acting as the primary maintainer of the membership website (with Martin Michlmayr as the other active contributor) and hosting it on my own machine. I managed to finally extricate myself from this role in August. I remain a contributing member.

Personal projects 2020 finally saw another release (0.6.0, followed swiftly by 0.6.1 to allow the upload of 0.6.1-1 to Debian) of onak. This release finally adds various improvements to deal with the hostility shown to the OpenPGP keyserver network in recent years, including full signature verification as an option. I fixed an oversight in my Digoo/1-wire temperature decoder, and a bug that turned up on ARM but not MIPS in my mqtt-arp code. I should probably package it for Debian (even if I don't upload it), as I'm running it on my RB3011 now.

26 December 2020

Paul Tagliamonte: Reverse Engineering my Christmas Tree

Over the course of the last year and a half, I've been doing some self-directed learning on how radios work. I've gone from a very basic understanding of wireless communications (there's usually some sort of antenna, I guess?) all the way through the process of learning about and implementing a set of libraries to modulate and demodulate data using my now formidable stash of SDRs. I've been implementing all of the RF processing code from first principles, and purely based on other primitives I've written myself, to prove to myself that I understand each concept before moving on. I figured that there was a fun capstone to be done here - the blind reverse engineering and implementation of the protocol my cheap Amazon power switch uses to turn my Christmas tree on and off. All the work described in this post was done over the course of a few hours, thanks to help during the demodulation from Tom Bereknyei and hlieberman.

Going in blind When I first got my switch, I checked it for any FCC markings in order to look up the FCC filings to determine the operational frequency of the device, and maybe some other information such as declared modulation or maybe even part numbers and/or diagrams. However, beyond a few regulatory stickers, there were no FCC ids or other distinguishing IDs on the device. Worse yet, it appeared to be a whitelabeled version of another product, so searching Google for the product name was very unhelpful. Since operation of this device is unlicensed, I figured I'd start looking in the ISM band. The most common band used that I've seen is the band starting at 433.05MHz up to 434.79MHz. I fired up my trusty waterfall tuned to a center frequency of 433.92MHz (since it's right in the middle of the band, and it let me see far enough up and down the band to spot the remote) and pressed a few buttons. Imagine my surprise when I realized the operational frequency of this device is 433.920MHz, exactly dead center. Weird, but lucky! After taking a capture, I started to look at understanding what the modulation type of the signal was, and how I might go about demodulating it. Using inspectrum, I was able to clearly see the signal in the capture, and it immediately stuck out to my eye as being encoded using OOK / ASK. Next, I started to measure the smallest pulse, to see if I could infer the symbols per second, and try to decode it by hand. These types of signals are generally pretty easy to decode by eye. This wound up giving me a symbol rate of 2.2 Ksym/s, which is a lot faster than I expected. While I was working by hand, Tom demodulated a few messages in Python, and noticed that if you grouped the bits into groups of 4, you either had a 1000 or a 1110, which caused me to realize this was encoded using something I saw documented elsewhere, where the 0 is a short pulse, and a 1 is a long pulse, not unlike morse code, but where each symbol takes up a fixed length of time (monospace morse code?); there's a small code sketch of this after the packet layout below. Working on that assumption, I changed my inspectrum symbol width, and demodulated a few more by hand. This wound up demodulating nicely (and the preamble / clock sync could be represented as repeating 0s, which is handy!) and gave us a symbol rate of 612(ish) symbols per second, a lot closer to what I was expecting. If we take the code for 'on' in the inspectrum capture above and demodulate it by hand, we get 0000000000110101100100010 (treat a short pulse as a 0, and a long pulse as a 1). If you're interested in following along at home, click on the inspectrum image, write down the bits you see, and compare it to what I have! Right, so it looks like from what we can tell so far that the packet looks something like this:
preamble / sync
stuff
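(As promised, here is the sub-symbol trick in code: a small Python sketch, assuming the capture has already been thresholded into a string of fixed-width on/off sub-symbols. The mapping of a long pulse, 1110, to a 1 bit and a short pulse, 1000, to a 0 bit is the encoding described above.)

# Map each group of four sub-symbols back to a bit.
SYMBOLS = {"1110": "1", "1000": "0"}

def decode_pulses(subsymbols):
    bits = []
    for i in range(0, len(subsymbols), 4):
        bits.append(SYMBOLS.get(subsymbols[i:i + 4], "?"))  # "?" marks a framing error
    return "".join(bits)

# A long pulse followed by two short ones decodes to "100":
assert decode_pulses("111010001000") == "100"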
Next, I took a capture of all the button presses and demodulated them by hand, and put them into a table to try and understand the format of the messages:
Button Demod'd Bits
On 0000000000110101100100010
Off 00000000001101011001010000
Dim Up 0000000000110101100110100
Dim Down 0000000000110101100100100
Timer 1h 0000000000110101100110010
Timer 2h 0000000000110101100100110
Timer 4h 0000000000110101100100000
Dim 100% 0000000000110101000101010
Dim 75% 00000000001101010001001100
Dim 50% 00000000001101010001001000
Dim 25% 0000000000110101000100000
Great! So, this is enough to attempt to control the tree with, I think, so I wrote a simple modulator. My approach was to use the fact that I can break down a single symbol into 4 sub-symbol components, which is to say, go back to representing a 1 as 1110, and a 0 as 1000. This let me allocate IQ space for the symbol, break the bit into 4 symbols, and if that symbol is 1, write out values from a carrier wave (cos in the real values, and sin in the imaginary values) to the buffer (a sketch of this appears after the packet layout below). Now that I can go from bits to IQ data, I can transmit that IQ data using my PlutoSDR or HackRF and try to control my tree. I gave it a try, and the tree blinked off! Success! But wait, that's not enough for me. I know I can't just demodulate bits and replay them forever; there's stuff like addresses and keys and stuff, and I want to get a second one of these working. Let's take a look at the bits to see if we spot anything fun & interesting. At first glance, a few things jumped out at me as being weird. First is that the preamble is 10 bits long (fine, let's move along; maybe it just needs 8 in a row and there's two to ensure clocks sync?). Next is that the messages are not all the same length. I double (and triple!) checked the messages, and it's true, the messages are not all the same length. Adding an extra bit at the end didn't break anything, but I wonder if that's just due to the implementation rather than the protocol. But, good news, it looks like we have a stable prefix to the messages from the remote; it must be my device's address! The stable 6 bits that jump out right away are 110101. Something seems weird, though; 6 bits is a bit awkward, even for a bit-limited embedded device. Why 6? But hey, wait, we had 10 bits in the preamble. What if we have an 8-bit address, meaning my device is 00110101, and the preamble is eight 0 symbols? Those are numbers that someone working on an 8-bit-aligned platform would pick! To test this, I added a 0 to the preamble to see if the message starts at the first 1, or if it requires all the bits to be fully decoded, and lo and behold, the tree did not turn on or off. This would seem to me to confirm that the 0s are part of the address, and I can assume we have two 8-bit-aligned bytes in the prefix of the message.
preamble / sync
address
stuff
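(Here is a rough Python sketch of the modulator approach described above, turning bits into sub-symbols and then into an IQ buffer. The sample rate and carrier offset are illustrative values I've picked, not the exact numbers used for the real transmission.)

import numpy as np

def modulate(bits, sample_rate=1_000_000, symbol_rate=612, carrier_hz=0.0):
    # Each bit becomes four sub-symbols: 1 -> 1110, 0 -> 1000.
    subsymbols = "".join("1110" if b == "1" else "1000" for b in bits)
    samples_per_sub = int(sample_rate / (symbol_rate * 4))
    t = np.arange(len(subsymbols) * samples_per_sub) / sample_rate
    # cos on I and sin on Q gives a complex carrier; gate it per sub-symbol.
    carrier = np.cos(2 * np.pi * carrier_hz * t) + 1j * np.sin(2 * np.pi * carrier_hz * t)
    gate = np.repeat([int(s) for s in subsymbols], samples_per_sub)
    return (gate * carrier).astype(np.complex64)  # IQ buffer for an SDR sink

iq = modulate("0000000000110101100100010")  # the "On" message from the table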
Now, when we go through the 9-10 bits of 'stuff', we see all sorts of weird bits floating all over the place. The first 4 bits look like they're either 1001 or 0001, but other than that, there's a lot of chaos. This is where things get really squishy. I needed more information to try and figure this out, but no matter how many times I sent a command it was always the same bits (so, no counters), and things still felt very opaque. The only way I was going to make any progress was to get another switch and see how the messages from the remote change. Off to Amazon I went, and ordered another switch from the same page, and eagerly awaited its arrival.

Switch #2 The second switch showed up, and I hurriedly unboxed the kit, put batteries into the remote, and fired up my SDR to take a capture. After I captured the first button ('Off'), my heart sank as I saw my lights connected to Switch #1 flicker off. Apparently the new switch and the old switch have the exact same address. To be sure, I demodulated the messages as before, and came out with the exact same bit pattern. This was a setback and a letdown; I was hoping to independently control my switches, but it also meant I got no additional information about the address or button format. The upside to all of this, though, is that because the switches are controlled by either remote, I only needed one remote, so why not pull it apart and see if I can figure out what components it's using to transmit, and find any datasheets I can. The PCB was super simple, and I wound up finding a WL116SC IC on the PCB. After some googling, I found a single lone datasheet, entirely in Chinese. Thankfully, Google Translate seems to have worked well enough on technical words, and I was able to put together at least a little bit of understanding based on the documentation that was made available. I took a few screenshots below - I put the Google-translated text above the hanzi. From that sheet, we can see we got the basics of the 1 and 0 symbol encoding right (I was halfway expecting the bits to be flipped), and a huge find by way of a description of the bits in the message! It's a bummer that we missed the clock sync / preamble pulse before the data message, but that's OK somehow. It also turns out that the 8 or 10 bit series of 0s wasn't clock sync at all - it was part of the address! Since it also turns out that all devices made by this manufacturer have the hardcoded address of []byte{0x00, 0x35}, that means that the vast majority of bits sent are always going to be the same for any button press on any remote made by this vendor. Seems like a waste of bits to me, but hey, what do I know. Additionally, this also tells us the trailing zeros are not part of the data encoding scheme, which is progress!
address
keycode
Now, working on the assumptions validated by the datasheet, here's the updated list of scancodes we've found:
Button Scancode Bits Integer
On 10010001 145 / 0x91
Off 10010100 148 / 0x94
Dim Up 10011010 154 / 0x9A
Dim Down 10010010 146 / 0x92
Timer 1h 10011001 153 / 0x99
Timer 2h 10010011 147 / 0x93
Timer 4h 10010000 144 / 0x90
Dim 100% 00010101 21 / 0x15
Dim 75% 00010011 19 / 0x13
Dim 50% 00010010 18 / 0x12
Dim 25% 00010000 16 / 0x10
Interestingly, I think the Dim keys may offer a confirmation that we have a good demod: the codes on the bottom are missing the most significant bit, and when I look back at the scancode table in the datasheet, they make an interesting pattern: the bottom two rows' right and left side values match up! If you take a look, Dim 100% is 'S1', Dim 75% is 'S19', Dim 50% is 'S8', and Dim 25% is 'S20'. Cool! Since none of the other codes line up, I am willing to bet the most significant bit is a 'Combo' indicator, and not part of the button (leaving 7 bits for the keycode). And even more interestingly, one of our scancodes ('Off', which is 0x94) shows up just below this table, in the examples. Overall, I think this tells us we have the right bits to look at for determining the scan code! Great news there!
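(To make the field split concrete, here is a small Python sketch that carves the hand-demodulated messages from the earlier table into the fields the datasheet describes: a fixed 16-bit vendor-wide address of 0x0035, followed by an 8-bit keycode, with the mystery trailing bits left over.)

def parse_message(bits):
    # 16-bit address, then an 8-bit keycode, then whatever is left over.
    address = int(bits[:16], 2)
    keycode = int(bits[16:24], 2)
    return address, keycode, bits[24:]

addr, key, rest = parse_message("0000000000110101100100010")  # the "On" row
assert addr == 0x0035 and key == 0x91 and rest == "0"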

Back to the modulation! So, armed with this knowledge, I was able to refactor my code to match the timings and understanding outlined by the datasheet and ensure things still work. The switch itself has a high degree of tolerance, so being wildly off frequency or at a wildly wrong symbol rate may actually still work. It's hard to know if this is more or less correct, but matching the documentation seems like a more stable foundation, if nothing else. This code has been really reliable, and tends to work just as well as the remote from what I've been able to determine. I've been using incredibly low power to avoid any interference, and it's been very robust - a testament to the engineering that went into the outlet hardware, even though it cost less than a lot of other switches! I have a lot of respect for the folks who built this device - it's incredibly simple and reliable, and my guess is this thing will keep working even in some fairly harsh RF environments. The only downside is the fact the manufacturer used the same address for all their devices, rather than programming a unique address for each outlet and remote when the underlying WL116SC chip supports it. I'm sure this was done to avoid complexity in assembly (e.g. pairing the remote and outlet, and having to keep those two items together during assembly), but it's still a bummer. I took apart the switch to see if I could dump an EEPROM and change the address in ROM, but the entire thing was potted in waterproof epoxy, which is a very nice feature if this was ever used outdoors. Not good news for tinkering, though!

Unsolved Mysteries At this point, even though I understand the protocol enough to control the device, it still feels like I hit a dead end in my understanding. I'm not able to figure out how exactly the scancodes are implemented, and break them down into more specific parts. They are stable and based on the physical wiring of the remote, so I think I'm going to leave them as magic numbers. I have what I was looking for, and these magic constants appear to be the right ones to use, even if I don't fully understand how the codes themselves are created. This does leave us with a few bits we never resolved, which I'll memorialize below just to be sure I don't forget about them. Question #1: According to the datasheet there should be a preamble. Why do I not see one leading the first message? My hunch is that the trailing 0 at the end of the payload is actually just the preamble for the next message (always rendering the first message invalid?). This would let us claim there's an engineering reason why we are ignoring the weird bit, and also explain away something from the documentation. It's just weird that it wouldn't be present on the first message. This theory is mostly confirmed by measuring the timing and comparing it to the datasheet, but it's not exactly in line with the datasheet timings either (specifically, it's off by 200 µs, which is kind of a lot for a system using 400 µs timings). I think I could go either way on the last 0 being the preamble for the next message. It could be that the first message is technically invalid, or it could also be that this was not implemented, or was actively disabled by the vendor for this specific application / device. It's really hard to know without getting the source code for the WL116SC chip in this specific remote, or the source in the outlet itself. Question #2: Why are some keycodes 8 bits and others 9 bits? I still have no idea why there are sometimes 8 bits (for instance, 'On') and other times 9 bits (for instance, 'Off') in the 8-bit keycode field. I spent some time playing with the trailing zeros: when I try to send an 'Off' with only the most significant 8 bits (without the least significant / last 9th bit, which is a 0), it does not turn the tree off. If I send an 'On' with 9 bits (an additional 0 after the least significant bit), it does work, and both 'On' and 'Off' work when I send 10, 11 or 12 bits padded with trailing zeros. I suspect my outlet will ignore data after the switch is done reading bits, regardless of trailing zeros. The docs tell me there should only be 8 bits, but it won't work unless I send 9 bits for some commands. There's something fishy going on here, and the datasheet isn't exactly right either way. Question #3: How in the heck do those scancodes work? This one drove me nuts. I've spent countless hours on trying to figure this out, including emailing the company that makes the WL116SC (they're really nice!), and even though they were super kind and generous with documentation and example source, I'm still having a hard time lining up their documentation and examples with what I see from my remote. I think the manufacturer of my remote and switch has modified the protocol enough to where there's actually something different going on here. Bummer. I wound up in my place of last resort: asking friends over Signal to try and see if they could find a pattern, as well as making multiple pleas to the twittersphere, to no avail (but thank you to Ben Hilburn, devnulling, Andreas Bombe and Larme for your replies, help and advice!)
I still don't understand how they assemble the scan code: for instance, if you merely add, you won't know if a key press of 0x05 is 0x03 + 0x02 or if it's 0x01 + 0x04. On the other hand, treating it as two 4-bit integers won't work for 0x10 to 0x15 (since they need 5 bits to represent). It's also likely the most significant bit is a combo indicator, which only leaves 7 bits for the actual keypress data. Stuffing 10 bits of data into 7 bits is likely resulting in some really intricate bit work. On a last-ditch whim, I tried to XOR the math into working, but some initial brute forcing to make the math work given the provided examples did not result in anything (a sketch of that brute force follows the table below). It could be a bitpacked field that I don't understand, but I don't think I can make progress on that without inside knowledge and much more work. Here's the table containing the numbers I was working off of:
Keys Key Codes Scancode
S3 + S9 0x01 + 0x03 0x96
S6 + S12 0x07 + 0x09 0x94
S22 + S10 0x0D + 0x0F 0x3F
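(Here is the kind of brute force I mean, as a small Python sketch: try a few simple pairwise operations on the key codes and check them against the observed combo scancodes. As noted above, none of them pan out.)

OBSERVED = [((0x01, 0x03), 0x96), ((0x07, 0x09), 0x94), ((0x0D, 0x0F), 0x3F)]

OPS = {
    "add": lambda a, b: (a + b) & 0xFF,
    "xor": lambda a, b: a ^ b,
    "or": lambda a, b: a | b,
    "pack 4+4": lambda a, b: ((a << 4) | b) & 0xFF,  # two 4-bit fields
}

for name, op in OPS.items():
    if all(op(a, b) == code for (a, b), code in OBSERVED):
        print(name, "matches all observed scancodes")
# (nothing prints: none of these simple schemes reproduce the scancodes)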
If anyone has thoughts on how these codes work, I'd love to hear about it! Send me an email or a tweet or something - I'm a bit stumped. There's some trick here that is being used to encode the combo key in a way that is decodeable. If it's actually not decodeable (which is a real possibility!), this may act as a unique button-combo hash, which allows the receiver to not actually determine which keys are pressed, but have a unique code that gets sent when a combo is used. I'm not sure I know enough to have a theory as to which it may be.

16 December 2020

Jonathan McDowell: DeskPi Pro + 8GB Pi 4

DeskPi Pro Raspberry Pi case Despite having worked on a number of ARM platforms I've never actually had an ARM based development box at home. I have a Raspberry Pi B Classic (the original 256MB rev 0002 variant) a coworker gave me some years ago, but it's not what you'd choose for a build machine and it generally gets used as a self-contained TFTP/console server for hooking up to devices under test. Mostly I've been able to do kernel development with the cross compilers already built as part of Debian, and either use pre-built images or Debian directly when I need userland pieces. At a previous job I had a Marvell MACCHIATObin available to me, which works out as a nice platform - quad core A72 @ 2GHz with 16GB RAM, proper SATA and a PCIe slot. However they're still a bit pricey for a casual home machine. I really like the look of the HoneyComb LX2 - 16 A72 cores, up to 64GB RAM - but it's even more expensive. So when I saw the existence of the 8GB Raspberry Pi 4 I was interested. Firstly, the Pi 4 is a proper 64 bit device (my existing Pi B is ARMv6 which means it needs to run Raspbian instead of native Debian armhf), capable of running an upstream kernel and unmodified Debian userspace. Secondly the Pi 4 has a USB 3 controller sitting on a PCIe bus rather than just the limited SoC USB 2 controller. It's not SATA, but it's still a fairly decent method of attaching some storage that's faster/more reliable than an SD card. Finally 8GB RAM is starting to get to a decent amount - for a headless build box 4GB is probably generally enough, but I wanted some headroom. The Pi comes as a bare board, so I needed a case. Ideally I wanted something self contained that could take the Pi, provide a USB/SATA adaptor and take the drive too. I came across the pre-order for the DeskPi Pro, decided it was the sort of thing I was after, and ordered one towards the end of September. It finally arrived at the start of December, at which point I got round to ordering a Pi 4 from CPC. Total cost ~£120 for the case + Pi.

The Bad First, let's get the bad parts out of the way. Broken USB port (right) I managed to break a USB port on the DeskPi. It has a pair of forward facing ports; I plugged my wireless keyboard dongle into one and when trying to remove it the solid spacer bit in the socket broke off. I've never had this happen to me before and I've been using USB devices for 20 years, so I'm putting the blame on a shoddy socket. The first drive I tried was an old Crucial M500 mSATA device. I have an adaptor that makes it look like a normal 2.5" drive so I used that. Unfortunately it resulted in a boot loop; the Pi would boot its initial firmware, try to talk to the drive and then reboot before even loading Linux. The DeskPi Pro comes with an M.2 adaptor and I had a spare M.2 drive, so I tried that and it all worked fine. This might just be power issues, but it was an unfortunate experience, especially after the USB port had broken off. (Given I ended up using an M.2 drive another case option would have been the Argon ONE M.2, which is a bit more compact.)

The Annoying DeskPi Pro without rear bezel The case is a little snug; I was worried I was going to damage things as I slid it in. Additionally the construction process is a little involved. There's a good set of instructions, but there are a lot of pieces and screws involved. This includes a couple of FFC cables to join things up. I think this is because they've attempted to make a compact case rather than allowing a little extra room, and it does have the advantage that once assembled it feels robust without anything loose in it. DeskPi Pro with rear bezel and USB3 dongle I hate the need for an external USB3 dongle to bridge from the Pi to the USB/SATA adaptor. All the cases I've seen with an internal drive bay have to do this, because the USB3 isn't brought out internally by the Pi, but it just looks ugly to me. It's hidden at the back, but meh. Fan control is via a USB/serial device, which is fine, but it attaches to the USB C power port, which defaults to being a USB peripheral. Raspbian based kernels support device tree overlays which allow easy reconfiguration to host mode, but for a Debian based system I ended up rolling my own dtb file. I changed
#include "bcm283x-rpi-usb-peripheral.dtsi"
to
#include "bcm283x-rpi-usb-host.dtsi"
in arch/arm/boot/dts/bcm2711-rpi-4-b.dts and then I did:
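# Preprocess the DTS with cpp so its #include directives get resolved,
# then compile the preprocessed source into a binary device tree blob: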
cpp -nostdinc -I include -I arch -undef -x assembler-with-cpp \
    arch/arm/boot/dts/bcm2711-rpi-4-b.dts > rpi4.preprocessed
dtc -I dts -O dtb rpi4.preprocessed -o bcm2711-rpi-4-b.dtb
and the resulting bcm2711-rpi-4-b.dtb file replaced the one in /boot/firmware. This isn't a necessary step if you don't want to use the cooling fan in the case, or the front USB ports, and it's not really anyone's fault, but it was an annoying extra step to have to figure out. The DeskPi came with a microSD card that was supposed to have RaspiOS already on it. It didn't; it was blank. In my case that was fine, because I wanted to use Debian, but it was a minor niggle.

The Good I used Gunnar's pre-built Pi Debian image and it Just Worked; I dd'd it to the microSD as instructed and the Pi 4 came up with working wifi, video and USB, enabling me to get it configured for my network. I did an apt upgrade and got updated to the Buster 10.7 release, as well as the latest 5.9 backport kernel, and everything came back without effort after a reboot. It's lovely to be able to run Debian on this device without having to futz around with self-compiled kernels. The DeskPi makes a lot of effort to route things externally. The SD slot is brought out to the front, making it easy to fiddle with the card contents without having to open the case to replace it. All the important ports are brought out to the back either through orientation of the Pi, or extenders in the case. That means the built in Pi USB ports, the HDMI sockets (conveniently converted to full size internally), an audio jack and a USB-C power port. The aforementioned USB3 dongle for the bridge to the drive is the only external thing that's annoying. Thermally things seem good too. I haven't done a full torture test yet, but with the fan off the system is sitting at about 40°C while fairly idle. Some loops in bash that push the load up above 2 get the temperature up to 46°C or so, and turning the fan on brings it down to 40°C again. It's audible, but quieter than my laptop and not annoying. I liked the way the case came with everything I needed other than the Pi 4 and a suitable disk drive. There was an included PSU (a proper USB-C PD device, UK plug), the heatsink/fan is there, the USB/SATA converter is there and even an SD card is provided (though that's just because I had a pre-order). Speaking of the SD, I only needed it for initial setup. Recent Pi 4 bootloaders are capable of booting directly from USB mass storage devices. So I upgraded using the RPi EEPROM Recovery image (which just needs to be extracted to the SD FAT partition, no need for anything complicated - boot with it and the screen goes all green and you know it's ok), then created a FAT partition at the start of the drive for the kernel / bootloader config and a regular EXT4 partition for root. I copied everything over, updated paths, took out the SD and it all just works happily.

Summary My main complaint is the broken USB port, which feels like the result of a cheap connector. For a front facing port expected to see more use than the rear ports I think there's a reasonable expectation of robustness. However I'm an early adopter and maybe future runs will be better. Other than that I'm pretty happy. The case is exactly the sort of thing I wanted; I was looking for something that would turn the Pi into a box that can sit on my desk on the network and that I don't have to worry about knocking wires out of, or lots of cables hooking bits up. Everything being included made it very convenient to get up and running. I still haven't poked the Pi that hard, but first impressions are looking good for it being a trouble-free ARM64 dev box in the corner, until I can justify a HoneyComb.

5 December 2020

Thorsten Alteholz: My Debian Activities in November 2020

FTP master Unfortunately a day only has 24h. As the freeze is approaching, I had to concentrate a bit more on keeping my packages in shape. So this month I only accepted nine packages. The good news: I rejected no package. The overall number of packages that got accepted was 328. Debian LTS This was my seventy-seventh month doing some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. This month my overall workload was 22.75h. During that time I did LTS uploads of: I also started to work on x11vnc and slirp. Last but not least I did some days of frontdesk duties. Debian ELTS This month was the twenty-ninth ELTS month. During my allocated time I uploaded: Unfortunately I also had to give back some hours. Last but not least I did some days of frontdesk duties. Other stuff This month I uploaded new upstream versions of: I fixed one or two bugs in: I improved packaging of: and there have even been some new packages: As it is again this time of the year, I would also like to draw some attention to the Debian Med Advent Calendar. Like in past years, the Debian Med team starts a bug squashing event from December 1st to 24th. Every bug that is closed will be registered in the calendar. So instead of taking something from the calendar, this special one will be filled, and at Christmas hopefully every Debian Med related bug is closed. Don't hesitate, start to squash :-). The announcement on the mailing list can be found here.

29 September 2020

Mike Gabriel: UBports: Packaging of Lomiri Operating Environment for Debian (part 03)

Before and during FOSDEM 2020, I agreed with the people (developers, supporters, managers) of the UBports Foundation to package the Unity8 Operating Environment for Debian. Since 27th Feb 2020, Unity8 has now become Lomiri. Recent Uploads to Debian related to Lomiri Over the past 4 months I worked on the following bits and pieces regarding Lomiri in Debian: The next two big projects / packages ahead are lomiri-ui-toolkit and qtmir. Credits Many big thanks go to Marius and Dalton for their work on the UBports project and for always being available for questions, feedback, etc. Thanks to Ratchanan Srirattanamet for providing some of his time for debugging some non-thread-safe unit tests (currently unsure what package we actually looked at...). Thanks to Florian Leeber for being my point of contact for topics regarding my cooperation with the UBports Foundation. Previous Posts about my Debian UBports Team Efforts

2 September 2020

Elana Hashman: My term at the Open Source Initiative thus far

When I ran for the OSI board in early 2019, I set three goals for myself: Now that the OSI has announced hiring an interim General Manager, I thought it would be a good time to publicly reflect on what I've accomplished and what I'd like to see next. As I promised in my campaign pitch, I aim to be publicly accountable :) Growing the OSI's membership I have served as our Membership Committee Chair since the May 2019 board meeting, tasked with devising and supervising strategy to increase membership and deliver value to members. As part of my election campaign last year, I signed up over 50 new individual members. Since May 2019, we've seen strong 33% growth of individual members, reaching a new all-time high of over 600 (638 when I last checked). I see the OSI as a relatively neutral organization that occupies a unique position to build bridges among organizations within the FOSS ecosystem. In order to facilitate this, we need a representative membership, and we need to engage those members and provide forums for cross-pollination. As Membership Committee Chair, I have been running quarterly video calls on Jitsi for our affiliate members, where we can share updates between many global organizations and discuss challenges we all face. But it's not enough just to hold the discussion; we also need to bring fresh new voices into the conversation. Since I've joined the board, I'm thrilled to say that 16 new affiliate members joined (in chronological order) for a total of 81: I was also excited to run a survey of the OSI's individual and affiliate membership to help inform the future of the organization, which received 58 long-form responses. The survey was accepted by the board at our August meeting and should be released publicly soon! Defending the Open Source Definition When I joined the board, the first committee I joined was the License Committee, which is responsible for running the licence review process, making recommendations on new licenses, and maintaining our existing licenses. Over the past year, under Pamela Chestek's leadership as Chair, the full board has approved the following licenses (with SPDX identifiers in brackets) on the recommendation of the License Committee: We withheld approval of the following licenses: I've also worked to define the scope of work for hiring someone to improve our license review process, which we have an open RFP for! Chopping wood and carrying water I joined the OSI with the goal of improving an organization I didn't think was performing up to its potential. Its membership and board were not representative of the wider open source community, its messaging felt outdated, and it seemed to be failing to rise to today's challenges for FOSS. But before one can rise to meet these challenges, one needs a strong foundation. The OSI needed the organizational structure, health, and governance in order to address such questions. Completing that work is essential, but not exactly glamorous, and it's a place where I thrive. Honestly, I don't (yet?) want to be the public face of the organization, and I apologize to those who've missed me at events like FOSDEM. I want to talk a little about some of my behind-the-scenes activities that I've completed as part of my board service: All of this work is intended to improve the organization's health and provide it with an excellent foundation for its mission. Defining the future of open source Soon after I was elected to the board, I gave a talk at Brooklyn.js entitled "The Future of Open Source."
In this presentation, I pondered the history and future of the free and open source software movement, and the ethical questions we must face. In my election campaign, I wrote "Software licenses are a means, not an end, to open source software. Focusing on licensing is necessary but not sufficient to ensure a vibrant, thriving open source community. Focus on licensing to the exclusion of other serious community concerns is to our collective detriment." My primary goal for my first term on the board was to ensure the OSI would be positioned to answer wider questions about the open source community and its future beyond licenses. Over the past two months, I supported Megan Byrd-Sanicki's suggestion to hold (and then participated in, with the rest of the board) organizational strategy sessions to facilitate our long-term planning. My contribution to help inform these sessions was providing the member survey on behalf of the Membership Committee. Now, I think we are much better equipped to face the hard questions we'll have to tackle. In my opinion, the Open Source Initiative is better positioned than ever to answer them, and I can't wait to see what the future brings. Hope to see you at our first State of the Source conference next week!

1 August 2020

Paul Wise: FLOSS Activities July 2020

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration
  • Debian wiki: unblock IP addresses, approve accounts, reset email addresses

Communication

Sponsors The purple-discord, ifenslave and psqlodbc work was sponsored by my employer. All other work was done on a volunteer basis.

16 July 2020

Louis-Philippe V ronneau: DebConf Videoteam Sprint Report -- DebConf20@Home

DebConf20 starts in about 5 weeks, and as always, the DebConf Videoteam is working hard to make sure it'll be a success. As such, we held a sprint from July 9th to 13th to work on our new infrastructure. A remote sprint certainly ain't as fun as an in-person one, but we nonetheless managed to enjoy ourselves. Many thanks to those who participated, namely: We also wish to extend our thanks to Thomas Goirand and Infomaniak for providing us with virtual machines to experiment on and host the video infrastructure for DebConf20. Advice for presenters For DebConf20, we strongly encourage presenters to record their talks in advance and send us the resulting video. We understand this is more work, but we think it'll make for a more agreeable conference for everyone. Video conferencing is still pretty wonky and there is nothing worse than a talk ruined by a flaky internet connection or hardware failures. As such, if you are giving a talk at DebConf this year, we are asking you to read and follow our guide on how to record your presentation. Fear not: we are not getting rid of the Q&A period at the end of talks. Attendees will ask their questions either on IRC or on a collaborative pad and the Talkmeister will relay them to the speaker once the pre-recorded video has finished playing. New infrastructure, who dis? Organising a virtual DebConf implies migrating from our battle-tested on-premise workflow to a completely new remote one. One of the major changes this means for us is the addition of Jitsi Meet to our infrastructure. We normally have 3 different video sources in a room: two cameras and a slides grabber. With the new online workflow, directors will be able to play pre-recorded videos as a source, will get a feed from a Jitsi room and will see the audience questions as a third source. This might seem simple at first, but is in fact a very major change to our workflow and required a lot of work to implement.
               == On-premise ==

            Camera 1
               |
               v                       +--> Frontend
 Slides --> Voctomix --> Backend ------+--> Frontend
               ^                       +--> Frontend
               |
            Camera 2

                 == Online ==

              Jitsi
                |
                v                       +--> Frontend
 Questions --> Voctomix --> Backend ----+--> Frontend
                ^                       +--> Frontend
                |
        Pre-recorded video
In our tests, playing back pre-recorded videos to voctomix worked well, but was sometimes unreliable due to inconsistent encoding settings. Presenters will thus upload their pre-recorded talks to SReview so we can make sure there aren't any obvious errors. Videos will then be re-encoded to ensure a consistent encoding and to normalise audio levels. This process will also let us stitch the Q&As at the end of the pre-recorded videos more easily prior to publication. Reducing the stream latency One of the pitfalls of the streaming infrastructure we have been using since 2016 is high video latency. In a worst case scenario, remote attendees could get up to 45 seconds of latency, making participation in events like BoFs arduous. In preparation for DebConf20, we added a new way to stream our talks: RTMP. Attendees will thus have the option of using either an HLS stream with higher latency or an RTMP stream with lower latency. Here is a comparative table that can help you decide between the two protocols:
HLS
  Pros:
  • Can be watched from a browser
  • Auto-selects a stream encoding
  • Single URL to remember
  Cons:
  • Higher latency (up to 45s)

RTMP
  Pros:
  • Lower latency (~5s)
  Cons:
  • Requires a dedicated video player (VLC, mpv)
  • Specific URLs for each encoding setting
Live mixing from home with VoctoWeb Since DebConf16, we have been using voctomix, a live video mixer developed by the CCC VOC. voctomix is conveniently divided in two: voctocore is the backend server while voctogui is a GTK+ UI frontend directors can use to live-mix. Although voctogui can connect to a remote server, it was primarily designed to run either on the same machine as voctocore or on the same LAN. Trying to use voctogui from a machine at home to connect to a voctocore running in a datacenter proved unreliable, especially for high-latency and low bandwidth connections. Inspired by the setup FOSDEM uses, we instead decided to go with a web frontend for voctocore. We initially used FOSDEM's code as a proof of concept, but quickly reimplemented it in Python, a language we are more familiar with as a team. Compared to the FOSDEM PHP implementation, voctoweb implements A / B source selection (akin to voctogui) as well as audio control, two very useful features. In the following screen captures, you can see the old PHP UI on the left and the new shiny Python one on the right. The old PHP voctoweb / The new Python3 voctoweb Voctoweb is still under development and is likely to change quite a bit until DebConf20. Still, the current version seems to work well enough to be used in production if you ever need to. Python GeoIP redirector We run multiple geographically-distributed streaming frontend servers to minimize the load on our streaming backend and to reduce overall latency. Although users can connect to the frontends directly, we typically point them to live.debconf.org and redirect connections to the nearest server. Sadly, 6 months ago MaxMind decided to change the licence on their GeoLite2 database and left us scrambling. To fix this annoying issue, Stefano Rivera wrote a Python program that uses the new database and reworked our ansible frontend server role. Since the new database cannot be redistributed freely, you'll have to get a (free) license key from MaxMind if you want to use this role. Ansible & CI improvements Infrastructure as code is a living process and needs constant care to fix bugs, follow changes in DSL and to implement new features. All that to say a large part of the sprint was spent making our ansible roles and continuous integration setup more reliable, less buggy and more featureful. All in all, we merged 26 separate ansible-related merge requests during the sprint! As always, if you are good with ansible and wish to help, we accept merge requests on our ansible repository :)
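(As an illustration of the redirector idea, not the videoteam's actual code: a minimal Python sketch that looks the client up in a local GeoLite2 country database and 302-redirects to an assumed per-continent frontend. The hostnames and continent mapping here are made up.)

import geoip2.database
import geoip2.errors
from http.server import BaseHTTPRequestHandler, HTTPServer

FRONTENDS = {"EU": "eu.live.example.org", "NA": "na.live.example.org"}
FALLBACK = "live.example.org"

reader = geoip2.database.Reader("/var/lib/GeoIP/GeoLite2-Country.mmdb")

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            continent = reader.country(self.client_address[0]).continent.code
        except geoip2.errors.AddressNotFoundError:
            continent = None  # unknown client location: use the fallback
        target = FRONTENDS.get(continent, FALLBACK)
        self.send_response(302)
        self.send_header("Location", f"https://{target}{self.path}")
        self.end_headers()

HTTPServer(("", 8080), Redirector).serve_forever()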

9 June 2020

Molly de Blanc: Racism is a Free Software Issue

Racism is a free software issue. I gave a talk that touched on this at CopyLeft Conf 2019. I also talked a little bit about it at All Things Open 2019 and FOSDEM 2020 in my talk The Ethics Behind Your IoT. I know statistics, theory, and free software. I don't know about race and racism nearly as well. I might make mistakes; I have made some and I will make more. Please, when I do, help me do better. I want to look at a few particular technologies and think about how they reinforce systemic racism. Worded another way: how is technology racist? How does technology hurt Black Indigenous People of Color (BIPOC)? How does technology keep us racist? How does technology make it easier to be racist?

Breathalyzers In the United States, Latinx folks are less likely to drink than white people and, overall, less likely to be arrested for DUIs [3,4]. However, they are more likely to be stopped by police while driving [5,6]. Who is being stopped by police is up to the police, and they pull over a disproportionate number of Latinx drivers. After someone is pulled over for suspected drunk driving, they are given a breathalyzer test. Breathalyzers are so easy to (un)intentionally mis-calibrate that they have been banned as valid evidence in multiple states. The biases of the police are not canceled out by the technology that should, in theory, let us know whether someone is actually drunk.

Facial Recognition I could talk about this for quite some time and, in fact, have. So have others. Google's image recognition software recognized black people as gorillas, and to fix the issue it removed gorillas from its image-labeling technology. Facial recognition software does a bad job at recognizing black people. In fact, it's also terrible at identifying indigenous people and other people of color. (Incidentally, it's also not great at recognizing women, but let's not talk about that right now.) As we use facial recognition technology for more things, from automated store checkouts (even more relevant in the socially distanced age of Covid-19), airport ticketing, phone unlocking, police identification, and a number of other things, it becomes a bigger problem that this software cannot tell the difference between two Asian people.

Targeted Advertising Black kids see 70% more online ads for food than white kids, and twice as many ads for junk food. In general BIPOC youth are more likely to see junk food advertisements online. This is intentional, and happens after they are identified as BIPOC youth.

Technology Reinforces Racism; Racism Builds Technology The technology we have developed reinforces racism on a society-wide scale because it makes it harder for BIPOC people to interact with this world that is run by computers and software. It's harder not to be racist when the technology around us is being used to perpetuate racist paradigms. For example, if a store implements facial recognition software for checkout, black women are less likely to be identified. They are then more likely to be targeted as trying to steal from the store. We are more likely to take this to mean that black women are more likely to steal. This is how technology builds racism. People are being excluded largely because they are not building these technologies, because they are not welcome in our spaces. There simply are not enough Black and Hispanic technologists, and that is a problem. We need to care about this because when software doesn't work for everyone, it doesn't work. We cannot build on the promise of free and open source software when we are excluding the majority of people.

7 June 2020

Enrico Zini: Cooperation links

This comic was based on this essay from Augusten Burroughs: How to live unhappily ever after. In addition to the essay, I highly recommend reading his books. It's also been described in psychology as flow.
With full English subtitles
Conflict: where it comes from and how to deal with it
Communication skills
Other languages
Distributed teams are where the people you work with aren't physically co-located, i.e. they're at another office building, at home, or at an outsourced company abroad. They're becoming increasingly popular, for DevOps and other teams, due to recruitment, diversity, flexibility and cost savings. Challenges arise due to timezones, language barriers, cultures and ways of working. People actively participating in Open Source communities tend to be effective in distributed teams. This session looks at how to apply core Open Source principles to distributed teams in Enterprise organisations, and the importance of shared purposes/goals, (mis)communication, leading vs managing teams, sharing and learning. We'll also look at practical aspects of what's worked well for others, such as alternatives to daily standups, promoting video conferencing, time management and virtual coffee breaks. This session is relevant for those leading or working in distributed teams, wanting to know how to cultivate an inclusive culture of increased trust and collaboration that leads to increased productivity and performance.
